Hybrid planner switching framework (HPSF) for autonomous driving needs to reconcile high-speed driving efficiency with safe maneuvering in dense traffic. Existing HPSF methods often fail to make reliable mode transitions or sustain efficient driving in congested environments, owing to heuristic scene recognition and low-frequency control updates. To address the limitation, this paper proposes LAP, a large language model (LLM) driven, adaptive planning method, which switches between high-speed driving in low-complexity scenes and precise driving in high-complexity scenes, enabling high qualities of trajectory generation through confined gaps. This is achieved by leveraging LLM for scene understanding and integrating its inference into the joint optimization of mode configuration and motion planning. The joint optimization is solved using tree-search model predictive control and alternating minimization. We implement LAP by Python in Robot Operating System (ROS). High-fidelity simulation results show that the proposed LAP outperforms other benchmarks in terms of both driving time and success rate.
We investigate how vibrotactile wrist feedback can enhance spatial guidance for handheld tool movement in optical see-through augmented reality (AR). While AR overlays are widely used to support surgical tasks, visual occlusion, lighting conditions, and interface ambiguity can compromise precision and confidence. To address these challenges, we designed a multimodal system combining AR visuals with a custom wrist-worn haptic device delivering directional and state-based cues. A formative study with experienced surgeons and residents identified key tool maneuvers and preferences for reference mappings, guiding our cue design. In a cue identification experiment (N=21), participants accurately recognized five vibration patterns under visual load, with higher recognition for full-actuator states than spatial direction cues. In a guidance task (N=27), participants using both AR and haptics achieved significantly higher spatial precision (5.8 mm) and usability (SUS = 88.1) than those using either modality alone, despite having modest increases in task time. Participants reported that haptic cues provided reassuring confirmation and reduced cognitive effort during alignment. Our results highlight the promise of integrating wrist-based haptics into AR systems for high-precision, visually complex tasks such as surgical guidance. We discuss design implications for multimodal interfaces supporting confident, efficient tool manipulation.




In mixed-traffic environments, where autonomous vehicles (AVs) interact with diverse human-driven vehicles (HVs), unpredictable intentions and heterogeneous behaviors make safe and efficient lane change maneuvers highly challenging. Existing methods often oversimplify these interactions by assuming uniform patterns. We propose an intention-driven lane change framework that integrates driving-style recognition, cooperation-aware decision-making, and coordinated motion planning. A deep learning classifier trained on the NGSIM dataset identifies human driving styles in real time. A cooperation score with intrinsic and interactive components estimates surrounding drivers' intentions and quantifies their willingness to cooperate with the ego vehicle. Decision-making combines behavior cloning with inverse reinforcement learning to determine whether a lane change should be initiated. For trajectory generation, model predictive control is integrated with IRL-based intention inference to produce collision-free and socially compliant maneuvers. Experiments show that the proposed model achieves 94.2\% accuracy and 94.3\% F1-score, outperforming rule-based and learning-based baselines by 4-15\% in lane change recognition. These results highlight the benefit of modeling inter-driver heterogeneity and demonstrate the potential of the framework to advance context-aware and human-like autonomous driving in complex traffic environments.




Pringle maneuver (PM) in laparoscopic liver resection aims to reduce blood loss and provide a clear surgical view by intermittently blocking blood inflow of the liver, whereas prolonged PM may cause ischemic injury. To comprehensively monitor this surgical procedure and provide timely warnings of ineffective and prolonged blocking, we suggest two complementary AI-assisted surgical monitoring tasks: workflow recognition and blocking effectiveness detection in liver resections. The former presents challenges in real-time capturing of short-term PM, while the latter involves the intraoperative discrimination of long-term liver ischemia states. To address these challenges, we meticulously collect a novel dataset, called PmLR50, consisting of 25,037 video frames covering various surgical phases from 50 laparoscopic liver resection procedures. Additionally, we develop an online baseline for PmLR50, termed PmNet. This model embraces Masked Temporal Encoding (MTE) and Compressed Sequence Modeling (CSM) for efficient short-term and long-term temporal information modeling, and embeds Contrastive Prototype Separation (CPS) to enhance action discrimination between similar intraoperative operations. Experimental results demonstrate that PmNet outperforms existing state-of-the-art surgical workflow recognition methods on the PmLR50 benchmark. Our research offers potential clinical applications for the laparoscopic liver surgery community. Source code and data will be publicly available.




Air traffic trajectory recognition has gained significant interest within the air traffic management community, particularly for fundamental tasks such as classification and clustering. This paper introduces Aircraft Trajectory Segmentation-based Contrastive Coding (ATSCC), a novel self-supervised time series representation learning framework designed to capture semantic information in air traffic trajectory data. The framework leverages the segmentable characteristic of trajectories and ensures consistency within the self-assigned segments. Intensive experiments were conducted on datasets from three different airports, totaling four datasets, comparing the learned representation's performance of downstream classification and clustering with other state-of-the-art representation learning techniques. The results show that ATSCC outperforms these methods by aligning with the labels defined by aeronautical procedures. ATSCC is adaptable to various airport configurations and scalable to incomplete trajectories. This research has expanded upon existing capabilities, achieving these improvements independently without predefined inputs such as airport configurations, maneuvering procedures, or labeled data.
In this paper, we have proposed a new strategy of using the landmark anchor node instead of a radio-based anchor node to obtain the virtual coordinates (landmarkID, DISTANCE) of moving troops or defense forces that will help in tracking and maneuvering the troops along a safe path within a GPS-denied battlefield environment. The proposed strategy implements landmark recognition using the Yolov5 model and landmark distance estimation using an efficient Stereo Matching Algorithm. We consider that a moving node carrying a low-power mobile device facilitated with a calibrated stereo vision camera that captures stereo images of a scene containing landmarks within the battlefield region whose locations are stored in an offline server residing within the device itself. We created a custom landmark image dataset called MSTLandmarkv1 with 34 landmark classes and another landmark stereo dataset of those 34 landmark instances called MSTLandmarkStereov1. We trained the YOLOv5 model with MSTLandmarkv1 dataset and achieved 0.95 mAP @ 0.5 IoU and 0.767 mAP @ [0.5: 0.95] IoU. We calculated the distance from a node to the landmark utilizing the bounding box coordinates and the depth map generated by the improved SGM algorithm using MSTLandmarkStereov1. The tuple of landmark IDs obtained from the detection result and the distances calculated by the SGM algorithm are stored as the virtual coordinates of a node. In future work, we will use these virtual coordinates to obtain the location of a node using an efficient trilateration algorithm and optimize the node position using the appropriate optimization method.




Multi-Agent Reinforcement Learning (MARL) has become a promising solution for constructing a multi-agent autonomous driving system (MADS) in complex and dense scenarios. But most methods consider agents acting selfishly, which leads to conflict behaviors. Some existing works incorporate the concept of social value orientation (SVO) to promote coordination, but they lack the knowledge of other agents' SVOs, resulting in conservative maneuvers. In this paper, we aim to tackle the mentioned problem by enabling the agents to understand other agents' SVOs. To accomplish this, we propose a two-stage system framework. Firstly, we train a policy by allowing the agents to share their ground truth SVOs to establish a coordinated traffic flow. Secondly, we develop a recognition network that estimates agents' SVOs and integrates it with the policy trained in the first stage. Experiments demonstrate that our developed method significantly improves the performance of the driving policy in MADS compared to two state-of-the-art MARL algorithms.




Ultrasound is a commonly used medical imaging modality that requires expert sonographers to manually maneuver the ultrasound probe based on the acquired image. Autonomous Robotic Ultrasound (A-RUS) is an appealing alternative to this manual procedure in order to reduce sonographers' workload. The key challenge to A-RUS is optimizing the ultrasound image quality for the region of interest across different patients. This requires knowledge of anatomy, recognition of error sources and precise probe position, orientation and pressure. Sample efficiency is important while optimizing these parameters associated with the robotized probe controller. Bayesian Optimization (BO), a sample-efficient optimization framework, has recently been applied to optimize the 2D motion of the probe. Nevertheless, further improvements are needed to improve the sample efficiency for high-dimensional control of the probe. We aim to overcome this problem by using a neural network to learn a low-dimensional kernel in BO, termed as Deep Kernel (DK). The neural network of DK is trained using probe and image data acquired during the procedure. The two image quality estimators are proposed that use a deep convolution neural network and provide real-time feedback to the BO. We validated our framework using these two feedback functions on three urinary bladder phantoms. We obtained over 50% increase in sample efficiency for 6D control of the robotized probe. Furthermore, our results indicate that this performance enhancement in BO is independent of the specific training dataset, demonstrating inter-patient adaptability.
The comprehensiveness of vehicle-to-everything (V2X) recognition enriches and holistically shapes the global Birds-Eye-View (BEV) perception, incorporating rich semantics and integrating driving scene information, thereby serving features of trajectory prediction, decision-making and driving planning. Utilizing V2X message sets to form BEV format proves to be an effective perception method for connected and automated vehicles (CAVs). Specifically, MAP, SPAT and RSI data contributes to the achievement of road connectivity, synchronized traffic signal navigation and obstacle warning. Moreover, using time-sequential BSMs information from multiple vehicles allows for the perception of current state and the prediction of future trajectories. Therefore, this paper develops a comprehensive autonomous driving model that relies on BEV-V2X perception, Interacting Multiple model Unscented Kalman Filter (IMM-UKF)-based trajectory prediction, and deep reinforcement learning (DRL)-based decision making and planing. We establish a DRL environment with reward-shaping methods to formulate a unified set of optimal driving behaviors that encompass obstacle avoidance, lane changes, overtaking, turning maneuver, and synchronized traffic signal navigation. Consequently, a complex traffic intersection scenario was simulated, and the well-trained model was applied for driving control. The observed driving behavior closely resembled that of an experienced driver, exhibiting anticipatory actions and revealing notable operational highlights of driving policy.
The rapid proliferation of non-cooperative spacecraft and space debris in orbit has precipitated a surging demand for on-orbit servicing and space debris removal at a scale that only autonomous missions can address, but the prerequisite autonomous navigation and flightpath planning to safely capture an unknown, non-cooperative, tumbling space object is an open problem. This requires algorithms for real-time, automated spacecraft feature recognition to pinpoint the locations of collision hazards (e.g. solar panels or antennas) and safe docking features (e.g. satellite bodies or thrusters) so safe, effective flightpaths can be planned. Prior work in this area reveals the performance of computer vision models are highly dependent on the training dataset and its coverage of scenarios visually similar to the real scenarios that occur in deployment. Hence, the algorithm may have degraded performance under certain lighting conditions even when the rendezvous maneuver conditions of the chaser to the target spacecraft are the same. This work delves into how humans perform these tasks through a survey of how aerospace engineering students experienced with spacecraft shapes and components recognize features of the three spacecraft: Landsat, Envisat, Anik, and the orbiter Mir. The survey reveals that the most common patterns in the human detection process were to consider the shape and texture of the features: antennas, solar panels, thrusters, and satellite bodies. This work introduces a novel algorithm SpaceYOLO, which fuses a state-of-the-art object detector YOLOv5 with a separate neural network based on these human-inspired decision processes exploiting shape and texture. Performance in autonomous spacecraft detection of SpaceYOLO is compared to ordinary YOLOv5 in hardware-in-the-loop experiments under different lighting and chaser maneuver conditions at the ORION Laboratory at Florida Tech.